Skip to content

Conversation

mbagherbeikTT
Copy link

LICM pass has been made more aggressive when evaluating what ops it can move out of loop regions.

Ops with MemWrite memory effects will now be moved out of loops if:

  1. op is speculatable
  2. inputs to op are defined outside of the loop that op resides in
  3. the op only has MemWrite effects
  4. no other operation in op's parent region has an Alloc/Free/Write effect on any of the resources written to by op
  5. no other operation with a Read effect on any of the resources written to by op dominates op in its parent region

…t and modified LICM pass. Allows speculatable ops with 'Init' Memory Effects to be moved out of loops if op does not have other, non-Init, Memory Effects and no other operations within it's nested region(s) have Memory Effects that apply to the same resources as the original op.
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@joker-eph
Copy link
Collaborator

I'm wary of something like:

store
for i = 0 -> 0: // never execute
  store
load

I suspect you'll move the store here, changing the value that the load is seeing.

@mbagherbeikTT
Copy link
Author

fair point. yes that will get moved out. The current LICM pass doesn't account for dead loops either and will move out the op if it has no mem effects AND is speculatable --> but since there's no effects it'll probably get removed by DCE whereas now the memWrites will force it to remain.

I'm looking through the LoopLikeOpInterface to see if we can infer dead loops. Might end up being restricted to only loops with known/constant iterator ranges. Let me know if you have any suggestions for where else to look for a solution.

@mbagherbeikTT mbagherbeikTT marked this pull request as ready for review August 26, 2025 21:03
@mbagherbeikTT
Copy link
Author

Added more conditions. Ops with MemWrite memory effects will now be moved out of loops if:

  1. op is speculatable
  2. op's parent loop region has constant bounds/steps
  3. op's parent loop isn't dead (is not a zero trip loop)
  4. inputs to op are defined outside of the loop that op resides in
  5. the op only has MemWrite effects
  6. no other operation in op's parent region has an Alloc/Free/Write effect on any of the resources written to by op
  7. no other operation with a Read effect on any of the resources written to by op dominates op in its parent region

@joker-eph
Copy link
Collaborator

Can you elaborate on why you chose to target writes instead of loads?

@mbagherbeikTT
Copy link
Author

mbagherbeikTT commented Aug 27, 2025

if you mean "why writes are movable but not reads (loads)": this is a partial implementation to make sure we're on the same page in terms of implementation details. If yes, then adding similar functionality for moving reads and even taking into account stages so an op can do neat things like make itself immovable by adding MemRead<rA, 0> and MemWrite<rA, 1> while another op can still be moved if the stages are reversed.

if you did mean loads, please elaborate as I'm not sure what you mean, specifically.

@mbagherbeikTT
Copy link
Author

Alrighty. It looks like the code changes so far are relatively up to standards. Mehdi provided a lot of useful feedback that I'll be addressing. I'll spend time over this weekend to make the major changes like adding read and stage support as well as making the walk more efficient.

If the maintainers (@joker-eph) prefer a more incremental update approach let me know.

As always, thanks again for spending time to look into this with me and the rest of the community.

@mbagherbeikTT
Copy link
Author

mbagherbeikTT commented Sep 5, 2025

Significantly changed LICM pass

  • reduced complexity to O(n)
  • I had to remove the isZeroTrip() check for now as that ended up being a much deeper rabbit hole than initially thought (e.g. have to use different functions to get bound/step information from loopLikeOpInterface of "affine.for" and "scf.for")
  • pass first maps out if sequence of Memory Effects on resources result in conflicts on each resource
  • Op can be LICM’d if:
    • isSpeculatable()
    • AND all of the op's Memory Effect resources are conflict free within the loop under analysis
  • A resource has a conflict within the loop under analysis if any of the following occur within the loop:
    • Within an op that takes a loop-variant input: MemWrite on the resource (input could be data source for write)
    • Within an op: a MemRead on any resource precedes a MemWrite on a resource (read data could be source for write)
    • MemAlloc or MemFree on the resource by any op
    • MemRead on the resource by one op that is followed by another op with a MemWrite on that resource

Looking forward to the feedback as this is a pretty wide departure from the previous method

@bondhugula bondhugula self-requested a review September 11, 2025 03:57
}
func.return %final : vector<4x4xf32>
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of the test cases in https://github.com/llvm/llvm-project/blob/main/mlir/test/Dialect/Affine/affine-loop-invariant-code-motion.mlir may be useful for these changes. We can delete the affine-licm pass if it's subsumed by LICM post this change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into the affine-LICM and currently the passes aren't quite the same, as the affine version doesn't check for speculability and most of the affine ops don't have the speculation interface which LICM requires. The ops require some modification to use this pass; it may be as simple as adding "AlwaysSpeculatable" to relevant ops but I can't speak to that with confidence.

Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions cpp,h -- mlir/include/mlir/Interfaces/SideEffectInterfaces.h mlir/include/mlir/Transforms/LoopInvariantCodeMotionUtils.h mlir/lib/Interfaces/SideEffectInterfaces.cpp mlir/lib/Transforms/Utils/LoopInvariantCodeMotionUtils.cpp mlir/test/lib/Dialect/Test/TestOps.h

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/mlir/include/mlir/Interfaces/SideEffectInterfaces.h b/mlir/include/mlir/Interfaces/SideEffectInterfaces.h
index e2b3ba10e..cb705a06e 100644
--- a/mlir/include/mlir/Interfaces/SideEffectInterfaces.h
+++ b/mlir/include/mlir/Interfaces/SideEffectInterfaces.h
@@ -383,7 +383,9 @@ getMemoryEffectsSorted(Operation *op);
 /// resource. An 'allocate' effect implies only allocation of the resource, and
 /// not any visible mutation or dereference.
 struct Allocate : public Effect::Base<Allocate> {
-  Allocate() : Effect::Base<Allocate>() { this->priority = Priority::kAllocPriority; }
+  Allocate() : Effect::Base<Allocate>() {
+    this->priority = Priority::kAllocPriority;
+  }
 };
 
 /// The following effect indicates that the operation frees some resource that
diff --git a/mlir/include/mlir/Transforms/LoopInvariantCodeMotionUtils.h b/mlir/include/mlir/Transforms/LoopInvariantCodeMotionUtils.h
index 82581efda..ef7c72e66 100644
--- a/mlir/include/mlir/Transforms/LoopInvariantCodeMotionUtils.h
+++ b/mlir/include/mlir/Transforms/LoopInvariantCodeMotionUtils.h
@@ -25,14 +25,16 @@ class Value;
 
 /// Gathers potential conflicts on all memory resources used within loop
 ///
-/// Given a target loop and an op within it (or the loop op itself), 
-/// gathers op's memory effects and flags potential resource conflicts 
-/// in a map and then recurses into the op's regions to gather nested 
-/// resource conflicts 
+/// Given a target loop and an op within it (or the loop op itself),
+/// gathers op's memory effects and flags potential resource conflicts
+/// in a map and then recurses into the op's regions to gather nested
+/// resource conflicts
 ///
 /// First call should use loop = someLoop and op = someLoop.getOperation()
-void gatherResourceConflicts(LoopLikeOpInterface loop, Operation *op,
-    DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>> &resourceConflicts);
+void gatherResourceConflicts(
+    LoopLikeOpInterface loop, Operation *op,
+    DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>>
+        &resourceConflicts);
 
 /// Given a list of regions, perform loop-invariant code motion. An operation is
 /// loop-invariant if it depends only of values defined outside of the loop.
diff --git a/mlir/lib/Interfaces/SideEffectInterfaces.cpp b/mlir/lib/Interfaces/SideEffectInterfaces.cpp
index ec8618e24..6603f9193 100644
--- a/mlir/lib/Interfaces/SideEffectInterfaces.cpp
+++ b/mlir/lib/Interfaces/SideEffectInterfaces.cpp
@@ -330,19 +330,19 @@ mlir::MemoryEffects::getMemoryEffectsSorted(Operation *op) {
 
   memInterface.getEffects(effectsSorted);
 
-  auto sortEffects = 
-    [](llvm::SmallVectorImpl<MemoryEffects::EffectInstance> &effects) {
-    llvm::stable_sort(effects, [](const MemoryEffects::EffectInstance &a,
-                                  const MemoryEffects::EffectInstance &b) {
-      if (a.getStage() < b.getStage())
-        return true;
-      
-      if (a.getStage() == b.getStage())
-        return a.getEffect()->getPriority() < b.getEffect()->getPriority();
-
-      return false; // b before a
-    });
-  };
+  auto sortEffects =
+      [](llvm::SmallVectorImpl<MemoryEffects::EffectInstance> &effects) {
+        llvm::stable_sort(effects, [](const MemoryEffects::EffectInstance &a,
+                                      const MemoryEffects::EffectInstance &b) {
+          if (a.getStage() < b.getStage())
+            return true;
+
+          if (a.getStage() == b.getStage())
+            return a.getEffect()->getPriority() < b.getEffect()->getPriority();
+
+          return false; // b before a
+        });
+      };
   sortEffects(effectsSorted);
 
   return effectsSorted;
@@ -352,12 +352,12 @@ bool mlir::isMemoryEffectFree(Operation *op) {
   if (auto memInterface = dyn_cast<MemoryEffectOpInterface>(op)) {
     if (!memInterface.hasNoEffect())
       return false;
-    
+
     // If the op does not have recursive side effects, then it is memory effect
     // free.
     if (!op->hasTrait<OpTrait::HasRecursiveMemoryEffects>())
       return true;
-    
+
   } else if (!op->hasTrait<OpTrait::HasRecursiveMemoryEffects>()) {
     // Otherwise, if the op does not implement the memory effect interface and
     // it does not have recursive side effects, then it cannot be known that the
diff --git a/mlir/lib/Transforms/Utils/LoopInvariantCodeMotionUtils.cpp b/mlir/lib/Transforms/Utils/LoopInvariantCodeMotionUtils.cpp
index 7b59b4abb..a2cbcc40e 100644
--- a/mlir/lib/Transforms/Utils/LoopInvariantCodeMotionUtils.cpp
+++ b/mlir/lib/Transforms/Utils/LoopInvariantCodeMotionUtils.cpp
@@ -60,18 +60,19 @@ static bool canBeHoisted(Operation *op,
       op, [&](OpOperand &operand) { return definedOutside(operand.get()); });
 }
 
-/// Merges srcEffect's Memory Effect on its resource into the 
+/// Merges srcEffect's Memory Effect on its resource into the
 /// resourceConflicts map, flagging resources if the srcEffect
 /// results in a conflict
-static void mergeResource(
-  DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>> &resourceConflicts,
-  const MemoryEffects::EffectInstance &srcEffect,
-  bool srcHasConflict) {
+static void
+mergeResource(DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>>
+                  &resourceConflicts,
+              const MemoryEffects::EffectInstance &srcEffect,
+              bool srcHasConflict) {
 
   TypeID srcResourceID = srcEffect.getResource()->getResourceID();
 
-  bool srcIsAllocOrFree = isa<MemoryEffects::Allocate>(srcEffect.getEffect())
-    || isa<MemoryEffects::Free>(srcEffect.getEffect());
+  bool srcIsAllocOrFree = isa<MemoryEffects::Allocate>(srcEffect.getEffect()) ||
+                          isa<MemoryEffects::Free>(srcEffect.getEffect());
 
   bool conflict = srcHasConflict || srcIsAllocOrFree;
 
@@ -79,7 +80,8 @@ static void mergeResource(
 
   // if it doesn't already exist, create entry for resource in map
   if (dstIt == resourceConflicts.end()) {
-    resourceConflicts.insert(std::make_pair(srcResourceID, std::make_pair(conflict, srcEffect)));
+    resourceConflicts.insert(
+        std::make_pair(srcResourceID, std::make_pair(conflict, srcEffect)));
     return;
   }
 
@@ -93,10 +95,10 @@ static void mergeResource(
   bool srcWrite = isa<MemoryEffects::Write>(srcEffect.getEffect());
   bool dstRead = isa<MemoryEffects::Read>(dstEffect.getEffect());
   bool readBeforeWrite = dstRead && srcWrite;
-  
+
   conflict = conflict || readBeforeWrite;
 
-  dstIt->second =std::make_pair(conflict, srcEffect);
+  dstIt->second = std::make_pair(conflict, srcEffect);
 }
 
 /// Returns true if any of op's OpOperands are defined outside of loopLike
@@ -113,14 +115,16 @@ static bool hasLoopVariantInput(LoopLikeOpInterface loopLike, Operation *op) {
 /// flagged as having a conflict within the resourceConflicts map
 /// (b) op doesn't have a MemoryEffectOpInterface or has one but
 /// without any specific effects
-static bool mayHaveMemoryEffectConflict(Operation *op,
-  DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>> &resourceConflicts) {
+static bool mayHaveMemoryEffectConflict(
+    Operation *op,
+    DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>>
+        &resourceConflicts) {
 
   auto memInterface = dyn_cast<MemoryEffectOpInterface>(op);
-  
+
   // op does not implement the memory effect op interface
   // shouldn't be flagged as movable to be conservative
-  if (!memInterface) 
+  if (!memInterface)
     return true;
 
   // gather all effects on op
@@ -128,7 +132,7 @@ static bool mayHaveMemoryEffectConflict(Operation *op,
   memInterface.getEffects(effects);
 
   // op has interface but no effects, be conservative
-  if (effects.empty()) 
+  if (effects.empty())
     return true;
 
   // RFC moving ops with HasRecursiveMemoryEffects that have nested ops
@@ -138,10 +142,10 @@ static bool mayHaveMemoryEffectConflict(Operation *op,
   for (const MemoryEffects::EffectInstance &effect : effects) {
     auto resourceID = effect.getResource()->getResourceID();
 
-    auto resConIt = resourceConflicts.find(resourceID); 
+    auto resConIt = resourceConflicts.find(resourceID);
     if (resConIt == resourceConflicts.end())
       return true; // RFC realistically shouldn't reach here but just in case?
-    
+
     bool hasConflict = resConIt->second.first;
     if (hasConflict)
       return true;
@@ -150,13 +154,15 @@ static bool mayHaveMemoryEffectConflict(Operation *op,
   return false;
 }
 
-void mlir::gatherResourceConflicts(LoopLikeOpInterface loopLike, Operation *op,
-  DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>> &resourceConflicts) {
+void mlir::gatherResourceConflicts(
+    LoopLikeOpInterface loopLike, Operation *op,
+    DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>>
+        &resourceConflicts) {
 
   if (auto memInterface = dyn_cast<MemoryEffectOpInterface>(op)) {
     // gather all effects on op
     llvm::SmallVector<MemoryEffects::EffectInstance> effects =
-      MemoryEffects::getMemoryEffectsSorted(op);
+        MemoryEffects::getMemoryEffectsSorted(op);
 
     if (!effects.empty()) {
       // any variant input to the op could be the data source
@@ -166,7 +172,7 @@ void mlir::gatherResourceConflicts(LoopLikeOpInterface loopLike, Operation *op,
       for (const MemoryEffects::EffectInstance &effect : effects) {
         bool conflict = false;
         bool isWrite = isa<MemoryEffects::Write>(effect.getEffect());
-        
+
         // all writes to a resource that follow a read on any other resource
         // have to be considered a conflict as guaranteeing that the read
         // is invariant and won't affect the write requires more robust logic
@@ -183,8 +189,8 @@ void mlir::gatherResourceConflicts(LoopLikeOpInterface loopLike, Operation *op,
   }
 
   for (Region &region : op->getRegions())
-    for (Operation &opInner : region.getOps()) 
-        gatherResourceConflicts(loopLike, &opInner, resourceConflicts);
+    for (Operation &opInner : region.getOps())
+      gatherResourceConflicts(loopLike, &opInner, resourceConflicts);
 }
 
 size_t mlir::moveLoopInvariantCode(
@@ -205,8 +211,10 @@ size_t mlir::moveLoopInvariantCode(
   // continuous region --> need to add fork checking
   //
   // loop "do" and "then" regions also merged
-  DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>> resourceConflicts;
-  mlir::gatherResourceConflicts(loopLike, loopLike.getOperation(), resourceConflicts);
+  DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>>
+      resourceConflicts;
+  mlir::gatherResourceConflicts(loopLike, loopLike.getOperation(),
+                                resourceConflicts);
 
   auto regions = loopLike.getLoopRegions();
   for (Region *region : regions) {
@@ -231,12 +239,12 @@ size_t mlir::moveLoopInvariantCode(
       LDBG() << "Checking op: "
              << OpWithFlags(op, OpPrintingFlags().skipRegions());
 
-      bool noMemoryConflicts = isMemoryEffectFree(op) 
-        || !mayHaveMemoryEffectConflict(op, resourceConflicts);
+      bool noMemoryConflicts =
+          isMemoryEffectFree(op) ||
+          !mayHaveMemoryEffectConflict(op, resourceConflicts);
 
-      if (!noMemoryConflicts
-        || !shouldMoveOutOfRegion(op, region)
-        || !canBeHoisted(op, definedOutside))
+      if (!noMemoryConflicts || !shouldMoveOutOfRegion(op, region) ||
+          !canBeHoisted(op, definedOutside))
         continue;
 
       LDBG() << "Moving loop-invariant op: " << *op;
@@ -260,9 +268,7 @@ size_t mlir::moveLoopInvariantCode(LoopLikeOpInterface loopLike) {
       [&](Value value, Region *) {
         return loopLike.isDefinedOutsideOfLoop(value);
       },
-      [&](Operation *op, Region *) {
-        return isSpeculatable(op);
-      },
+      [&](Operation *op, Region *) { return isSpeculatable(op); },
       [&](Operation *op, Region *) { loopLike.moveOutOfLoop(op); });
 }
 

if (effects.empty())
return true;

// RFC moving ops with HasRecursiveMemoryEffects that have nested ops
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC? Is this a TODO?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of, yes. HasRecursiveMemoryEffects still works as it previously did through the isMemoryEffectFree() path but I wanted to discuss how to add support through this part of the path before implementing it.

The simplest approach for an op with this trait is to recurse down the op's regions and if all nested ops have the MemEffInterface and are conflict free, and the op is speculatable, then the op is movable. The part I'm fuzzy on is whether all nested ops should be required to be speculatable too or if that doesn't matter

func.func @move_single_resource_write_dominant() attributes {} {
%c0_i32 = arith.constant 0 : i32
%c1_i32 = arith.constant 10 : i32
%c2_i32 = arith.constant 1 : i32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please name the variable according to the constant: reading the loops it is misleading right now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made my constant naming consistent with the stored values. On that note, I was looking for a way to arbitrarily tag ops, in this case the loops, in IR to use with the checker but couldn't find anything. I ended up using different constants for the loops' upper bounds so we can check exactly which loop was moved where instead. That should make the tests easier to understand.

%c0_i32 = arith.constant 0 : i32
%c1_i32 = arith.constant 10 : i32
%c2_i32 = arith.constant 1 : i32
%c0_i32_0 = arith.constant 0 : i32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have a duplicated 0 here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

// CHECK: "test.test_effects_write_A"() : () -> ()

"test.test_effects_read_A"() : () -> ()
"test.test_effects_write_A"() : () -> ()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is W->R movable but not R->W?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R->W isn't movable since the value of the read will be different between iterations 1 and 2.

For the flipped W->R of this case, there are no inputs to write_A and it has no read effects so we can infer that the write data has to be a constant within the op.

The read_A op has no operands or results or write effects so I can see an argument here that the read value doesn't get used by anything and the op can be moved. Is that an assumption that we can safely make in these cases?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R->W isn't movable since the value of the read will be different between iterations 1 and 2.

Sure, but the sequence R->W is indempotent, isn't it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's safe to assume that, because test_effects_read_A() doesn't take any input and doesn't have a result, the read data "isn't used," It would be idempotent.

If that can't be assumed, then the conservative option is to flag it as a conflict on the resource.

If the assumption CAN be made, I can check if the read data "is used" when mapping conflicts.

// CHECK: "test.test_effects_write_EF"() : () -> ()
// CHECK: "test.test_effects_read_EF"() : () -> ()

// Both of these should be moved out of their parent
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these what? Loops? Where is it checked?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in previous comments

%1 = arith.cmpi slt, %arg0, %c3_i32 : i32
scf.if %1 {
%c0_i32_0 = arith.constant 0 : i32
%c0_i32_1 = arith.constant 0 : i32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove redundant constant please, we already have 0 available in this function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

// CHECK: "test.test_effects_read_B"() : () -> ()
// CHECK: scf.for
// CHECK: scf.for
// CHECK: scf.for
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this is checking isn't clear to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in previous comments

// CHECK: "test.test_effects_write_A"() : () -> ()
// CHECK: "test.test_effects_read_A"() : () -> ()

// Loop should be moved out of parent
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it checked?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to using different constants for the loops' upper bounds so we can check exactly which loop was moved where

}
else {
// CHECK: "test.test_effects_write_F"() : () -> ()
// CHECK: "test.test_effects_read_F"() : () -> ()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the intent for these checkes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this particular check was to make sure that these ops aren't moved out of a non-loop region, and that their presence here will cause conflicts on resources that make ops in the regions above it unmovable


%input = arith.constant 7 : index
"test.test_effects_write_A_with_input"(%input) : (index) -> ()
"test.test_effects_read_A"() : () -> ()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't %input a loop invariant, and so why aren't all these moved out?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pass checks if the operands are defined inside/outside the loop. The arith.constant is moved out by LICM and a second LICM pass will also move the test ops. More complex logic for detecting if the op that defines the input in the loop is invariant or not could be added at some point

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pass checks if the operands are defined inside/outside the loop

LICM is a worklist driven algorithm, and when an operation is moved out, its users are reprocessed, so what's up here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue is that the conflicts are mapped out at loop level before we start moving things. After something is moved, the conflict map isn't recreated and a stale one is used for the rest of the operations. I added a fix by wrapping the main part of moveLoopInvariantCode() within a while loop where it'll repeatedly analyze the loop until no more moves are made. Let me know what your thoughts are

Copy link
Collaborator

@joker-eph joker-eph Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may be reasonable considering the implementation is very iterative. A more efficient one may compute iteratively what can be moved and keep the state updated, but that's out-of-reach for now.

I am more curious about where this is all leading right now in practice, considering that the algorithm does not check for loop being executed at least once, and so is restricted to speculatable operations, which is not compatible with "memory write" in general.

We're gonna pay some compile-time, we should know it'll serve actual use-cases. I can't picture one where we use scf.for and memref load/store right now for example?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping here? That's probably one of my main questions at the moment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with both your points regarding dead loop checking and being restricted to speculatable operations. I misunderstood why the movable ops had to be both effect free and speculatable in the old LICM.

I think we need to split up the condition so there's 2 separate checks:

  • shouldMoveOutSpeculatable --> isMemoryEffectFree && isSpeculatable
  • shouldMoveOutMemoryEffects --> !mayHaveMemoryEffectConflict && ( !hasSpeculatableOpInterface || isSpeculatable)

if either is true, the op can be moved.

I rebased so I can use getStaticTripCount() and I'm in the process of adding dead loop checking along with the above changes. Let me know if this is on the right track.

Regarding use-case, I still have to clean-up the test-cases so I'll add one that's as close to my own use-case as possible.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the latest changes addressed all your comments. looking forward to any/all feedback

return true;
}
return false;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I coincidentally worked on trip count this weekend: #158679

This is super tricky to get right when taking overflow into account (your logic is likely not correct for all possible cases here: just think that you're not accounting for unsigned flagged scf.for for example).

So I suggest rebasing on my PR after it lands, and use getStaticTripCount() == 0 :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic! Will rebase when it's available. Accounting for strange cases was one of my main worries after you pointed out the unsigned flag issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It landed FYI

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used it in addition to isZeroTrip() to confirm liveness

}
}

numMovedTotal += numMoved;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
numMovedTotal += numMoved;
numMovedTotal += numMoved;
LDBG() << "Finishing LICM iteration " << iteration++ << " moved " << numMoved << " ops, total is now " << numMovedTotal;

(need to add int iteration = 0; before the loop as well)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great suggestion: added

[&](Operation *op, Region *) {
return isMemoryEffectFree(op) && isSpeculatable(op);
},
[&](Operation *op, Region *) { return isSpeculatable(op); },
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit weird as it looks like a legality criteria but the callback is "shouldMove..." which looks like a profitability criteria instead.

I would think that the two are orthogonal.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the current check in the loop is rely on something being always speculatable, while I would think we should be able to improve it to move non-speculatable operations out of the loop when we know it'll execute at least once?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial understanding based on how the pass was written was that the op needs to be both speculatable and free of memory effects/conflicts... But, as I type this, it just clicked that the old LICM was centred entirely around conditional speculability and the mem effect check was a sub-condition for that 🤦

So there should be 2 pathways to moving an op out. if either is true, op is movable:

  1. isMemoryEffectFree(op) && isSpeculatable(op)
  2. !mayHaveMemoryEffectConflict(op, ...) && opDoesNotHaveConditionalSpeculabilityInterface

The reason for opDoesNotHaveConditionalSpeculabilityInterface is that if the interface is there, if the op speculatable, based on its definition, it shouldn't have side-effects and, if it's NotSpeculatable, then it may have undefined behaviour and shouldn't be moved.

Is that sufficient or do we need to check for anything else?

def TestEffectsReadAWriteB : TEST_Op<"test_effects_read_A_write_B",
[MemoryEffects<[MemRead<TestResourceA>,
MemWrite<TestResourceB>]>,
AlwaysSpeculatable]>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly confused by the notion of a write that can be speculated, can you clarify why this is OK?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

major misunderstanding on my part from how the pass was originally structured when I first started tinkering with it.

Speculatable trait has been removed from all of the new test ops

@mbagherbeikTT
Copy link
Author

mbagherbeikTT commented Oct 8, 2025

Revised how pass works, can now move code under 2 separate conditions:

  • speculability -> works same as before
  • memory effects —> requires loop to be LIVE based on current available methods

Other changes:

  • read before writes within the same op but on separate resources are now allowed if the read resource is not in conflict
  • test cases cleaned and provide more context
  • cleaned-up debug prints


auto condSpecInterface = dyn_cast<ConditionallySpeculatable>(op);

// if op implements ConditionallySpeculatable interface, must be speculatable!
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC: I put this in as a pre-caution

A more conservative approach which would make more sense (since speculatable ops aren't meant to have side-effects) is to simply return true if the interface is present

// free.
// A potential solution is to recursively gather all resources on all
// contained ops and then run the for-loop further below. Requires discussions
// re: obscure corner cases.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC: when an op reaches this part of the method, would it be correct to add an if statement on line 162 where it will select getEffects()/getEffectsRecursively() based on if it HasRecursiveMemoryEffects and move the op and it's nested regions?

@mbagherbeikTT
Copy link
Author

@joker-eph Sorry for the long delay. I finally got around to making the requested changes and cleaning up the code.

I think splitting up the movability checks into 2, 1 for speculability interface and 1 for memory effects for Live Loops, addresses the main concern from last time. I hope the fixed test-cases provide are more clear now and provide more value in terms of showing users how the pass is meant to work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants